Computer Science Journal of Moldova, vol. 21, no. 1(61), 2013

Towards an Automated Semiotic Analysis of the Romanian Political Discourse

Daniela Gîfu, Dan Cristea

Abstract

As is well known, on the political scene the success of a speech can be measured by the degree to which the speaker is able to change the attitudes, opinions, feelings and political beliefs of his or her audience. We suggest a range of analysis tools, all belonging to semiotics, from lexical-semantic to syntactic and rhetorical, which, once integrated into the exploratory panoply of discursive weapons of a political speaker, could influence the impact of her/his speeches on a receptive audience. Our approach is based on the assumption that semiotics, in its capacity as methodology and meta-language, can support a situational analysis of political discourse. Such an analysis requires establishing the communication situation (in our case, the Parliament's vote in favour of suspending the Romanian President), through which we can describe an act of communication. We describe a platform, the Discourse Analysis Tool (DAT), which integrates a range of natural language processing tools with the intent to identify significant characteristics of political discourse. The tool is able to produce comparative diagrams between the speeches of two or more subjects, or for the same subject in different contexts. Only the lexical-semantic methods are operational in the platform today, but our investigation suggests new dimensions touching on the syntactic, rhetorical and coherence perspectives.

Keywords: political discourse, natural language processing, president's suspension, lexical-semantic, syntax, rhetorical analysis, coherence of discourse.

© 2013 by D. Gîfu, D. Cristea

This work was supported by the POSDRU/89/1.5/S/63663 grant, and by the ICT-PSP projects METANET4U #270893 and ATLAS #250467.

1 Introduction

One of the major recent developments in the evaluation of political language and its
related facets (rhetoric, political science, journalism, sociology, etc.) is the increasing attention being paid to the objectivity and relevance of its semiotic dimensions. Theoretical approaches to the semiotics of discourse, involving pragmatic aspects (the dynamics of relations between individuals and signs), semantic aspects (the conceptual conglomerate found in the meanings of terms), and syntactic aspects (the relations between signs), have been significantly strengthened since the 1980s. Current approaches to analysing political language (the applicative dimension) are based on Natural Language Processing (NLP) techniques designed to investigate lexical-semantic aspects of discourse. The domain of NLP includes a theoretically motivated range of computational techniques for analysing and representing naturally occurring texts at one or more levels of linguistic analysis, for the purpose of achieving human-like proficiency in the interpretation of language for a range of tasks or applications.

In this paper we start by describing a platform, the Discourse Analysis Tool (DAT), which integrates a range of language processing tools with the intent to build complex characterisations of political discourse, and we show how its functionality can be extended with more complex features. A linguistic profile of an author is drawn by putting together features extracted from the following linguistic layers: lexicon and morphology (richness of the vocabulary, rare co-occurrences, repetitions, use of synonyms, coverage of verbs' grammatical tenses, etc.), semantics (semantic classes used) and syntax (complexity of syntactic constructions, frequency of relative clauses, sentence length, number of clauses per sentence, subordinate/coordinate structures, frequent use of certain types of syntactic relations, etc.).

Among the resources used for the study of natural language syntax, treebanks are of tremendous importance: large collections of sentences annotated by human
experts with syntactic structures. The collection described in this paper concerns the Romanian language and was built with the help of an interactive graphical tool that allows easy annotation, visualisation and modification of syntactic trees initially obtained through an automatic parsing process.

Our purpose was to develop a computational platform able to offer researchers in mass media and political sciences, political analysts, the public at large (interested in consolidating their options in any of the analysed political contexts) and, why not, even politicians themselves, the possibility to measure different parameters of a written political discourse.

The paper is structured as follows. Section 2 briefly describes previous work. Section 3 discusses a number of lexical-semantic, syntactic, rhetorical and pragmatic features whose values an automatic analysis can compute in order to draw statistics. Section 4 presents a platform for multi-dimensional political discourse analysis. Section 5 discusses an example of comparative analysis of discourses collected during the presidential crisis of July 2012, when the Parliament voted in favour of suspending the Romanian President. Finally, Section 6 highlights interpretations anchored in our analysis and presents conclusions.

2 Previous work

The aim of an interdisciplinary approach such as analysing the language of political speeches is to define and explain different discursive contexts, in this case as reflected by the online media. Studies in this direction have mainly concentrated on three tasks: the first concerned the cognitive and, often, the emotional side of how humans acquire, produce, and understand language; the second aimed at understanding the relationship between the linguistic utterance and the world; and the third, at understanding the linguistic structure of language as a communication device.
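The introduction lists profile features such as vocabulary richness and sentence length, and the platform's stated purpose is to measure parameters of this kind. As a minimal sketch of what such measurements can look like at the surface level, the Python code below computes a type/token ratio, the mean sentence length, and LIWC-style relative frequencies of semantic classes (occurrences of a class's words divided by the total number of words, expressed as a percentage), plus a class-by-class difference between two texts that drops absolute values below 0.5 percentage points, mirroring the cut-off described in Section 5.2. It is not DAT's implementation: DAT works on POS-tagged, lemmatised Romanian text, whereas this sketch matches lower-cased surface forms. The toy English lexicon (a trailing `*` marks a stem), the regular-expression tokeniser and sentence splitter, and all function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (NOT DAT's): an entry ending in "*" is a stem that
# matches any word starting with it; other entries must match a word exactly.
TOY_LEXICON = {
    "work*": {"work"},
    "nation": {"nationalism"},
    "hope*": {"positive"},
}

# Word characters minus digits and "_", i.e. (roughly) runs of letters.
WORD_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Lower-cased word tokens; digits and punctuation are dropped."""
    return WORD_RE.findall(text.lower())


def split_sentences(text):
    """Naive split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def classes_of(word, lexicon):
    """Union of the classes of all entries (exact form or stem*) matching word."""
    found = set()
    for entry, labels in lexicon.items():
        if entry.endswith("*"):
            if word.startswith(entry[:-1]):
                found |= labels
        elif word == entry:
            found |= labels
    return found


def profile(text, lexicon):
    """A few surface features plus per-class percentages of all words."""
    words = tokenize(text)
    n = len(words)
    counts = Counter(c for w in words for c in classes_of(w, lexicon))
    sentences = split_sentences(text)
    return {
        "type_token_ratio": len(set(words)) / n if n else 0.0,
        "mean_sentence_length": n / len(sentences) if sentences else 0.0,
        "class_pct": {c: 100.0 * k / n for c, k in counts.items()},
    }


def compare(p1, p2, threshold=0.5):
    """Per-class difference p1 - p2 in percentage points; |diff| < threshold is dropped."""
    classes = set(p1["class_pct"]) | set(p2["class_pct"])
    diffs = {c: p1["class_pct"].get(c, 0.0) - p2["class_pct"].get(c, 0.0)
             for c in classes}
    return {c: d for c, d in diffs.items() if abs(d) >= threshold}


if __name__ == "__main__":
    a = profile("We work. Work is hope for the nation!", TOY_LEXICON)
    b = profile("The nation, the nation and hope.", TOY_LEXICON)
    print(a["class_pct"])
    print(compare(a, b))
```

Because a word's classes are collected as a set, a word matched by several entries of the same class counts once for that class, while a word belonging to several classes counts once for each; how DAT resolves such overlaps is not specified here.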
Linguistics has usually treated language as an abstract object that can be accounted for without reference to social or political concerns of any kind.

As we will see, one component of the platform we present offers a lexical-semantic functionality that has some similarities with the approach used in Linguistic Inquiry and Word Count (LIWC). There are, however, important differences between the two platforms. LIWC-2007 basically counts words and increments the counters associated with their declared semantic classes. In the lexicon, words can be given by their full form, as a complete string of characters, or by their roots. For each input text, LIWC produces a set of tables, each displaying the occurrences of the word-like instances of the semantic classes defined in the lexicon as sub-unitary values: for each semantic class, the value is computed as the number of occurrences of the words belonging to that class divided by the total number of words in the text. It is left to the user to interpret these figures. Also, LIWC offers no support for lexical expressions.

A previous version of DAT already performed part-of-speech (POS) tagging and lemmatisation of words. The lexicon contains a collection of 9,000 lemmas with the POS categories verb, noun, adjective and adverb. In the context of the lexical-semantic analysis, pronouns, numerals, prepositions and conjunctions, considered to be semantically empty, have been left out. In contrast with LIWC-2007, which includes 64 semantic classes (grouped into 4 categories: linguistic, 22 classes; psychological, 32 classes; socio-professional concerns, 7 classes; and paralinguistic, 3 classes), DAT v3 works with 33 semantic classes, 5 of which are newly introduced, chosen to best fit the needs of interpreting political discourse.

The second range of differences between the two platforms concerns the user interface. In DAT, the user is
served by a friendly interface offering many more services: opening one or more files, displaying the file(s), modifying/editing and saving the text, undo/redo, editing the lexicon, visualising the occurrences of instances of certain semantic classes in the text, etc. The menus also offer a whole range of output visualisation functions, from tabular form to graphical representations and printing services. Finally, another important development of the semantic approach was the inclusion of a collection of formulas that can be used to make comparative studies between different subjects.

The lexicon entries are coded in XML, following one of two patterns, in which wordStem is the stem of a word (i.e., a string of symbols optionally followed by the `*' sign), wordLemma is the lemma of a word, and semList is a list of semantic classes (each indicated by a unique identifier). The following line shows an example of such a lexical entry:

3 Semiotic features of political discourse

As a meta-language, semiotics explains the evolution of different types of object-languages, from physical to linguistic ones (among the latter, political discourse). It helps to understand the way humans apply these systems with the intent to "designate states of possible worlds or to criticize and even change the structure of systems". The three analytical horizons are: the structural analysis of the levels/hierarchical relations of the (macro)sign, the triadic analysis (syntactic, semantic, pragmatic), and the analysis of the communication situation taken for investigation. In the following we focus on one of the three horizons of analysis assumed by the semiotic methodology, namely the triadic analysis. According to this view, any text/discourse can be analysed from three perspectives: syntactic (the relation between signs), semantic (the relation between signs and their referents), and pragmatic (the relation between participants in the communication). Highlighting
methodological operations presumed by such a perspective offers as many (re)signifying strategies for political contexts. We will apply analytical techniques developed in the NLP field to a semiotic study of political texts, in the classical sense going back to the methodology proposed by Ferdinand de Saussure, in order to show that the results are comparable to a significant degree and, therefore, that there are good reasons to trust the computational techniques.

3.1 The lexical-semantic perspective

A lexical-semantic perspective is supposed to focus on the following targets:

1. establishing the meanings that a political speech includes, as a whole or at the level of its content units (negative/positive, affirmative/adversative, etc.); determining the degree of correlation (motivation) between the orientation of the political speech and the language (code) used (adequate, partially adequate, inadequate);
2. a qualitative-semantic analysis of content units, which could be carried out along two dimensions: denotative (what is said explicitly about the topic discussed), focusing on the intelligibility of the political text, by assessing its lexical-semantic connectedness, or by measuring the originality, oddity or banality of the lexicon used, as well as the phrase length, the number of subordinate clauses, parentheses, etc.; and connotative (the side suggestions, what is said between the lines, the symbolism of the language used), aiming to highlight the possible hidden meanings of a speech and to determine the most likely ones by taking into account all circumstantial (situational) factors, and to specify the gap between the explicit and implicit intentions expressed;
3. a quantitative-semantic analysis focused on determining the frequency of key concepts encountered in the political text, highlighting the frequency of certain themes in the speech, identifying the frequency of emotionally charged terms, etc.; building a dictionary of
symbols (for key concepts) specific to political discourse, which helps to frame it in terms of semantic categories;
4. a discourse and para-language analysis, covering the identification of the rhetorical aspects of the verbal language (spectacular, suggestive, allusive, emotional, metaphoric factors, etc.) and the characteristics of the nonverbal language, which carry significant weight in political discourse.

The political speaker is determined to win empathy and to convince the public. Yet, while staying within the general limits of the political goals, a skilful politician very often studies the public in order to settle on the type of vocabulary and the message to be delivered. He might exploit connections between more daring ideological categories (for instance, the class nationalism) and generally accepted ones (for instance, those belonging to the classes social, work, home). Present-day political language makes the most of the virtues of the metaphor, of its ability to pass abruptly from complex to simple, from abstract to concrete, imposing a powerful subjective and emotional dimension on the discourse (the class emotional). The political metaphor may lose the virtues of the poetic metaphor and become injurious (the class swear).

3.2 The syntactic perspective

Regarded as one of the most developed branches of semiotics, syntactic analysis studies the relations between signs and the logical and grammatical structure at the sentence level. A sentence is composed of an ordered sequence of language signs, governed by a set of combinatorial rules. From this perspective, the syntactic analysis of a text aims at: segmenting the text into information units (sentences, clauses, phrases, words and punctuation marks), identifying the constituency structure of the sentence (recurrent levels of constituency), and emphasising the dependency structure of a sentence (putting in evidence the unique syntactic head of
each word and the relation linking it to its head in a tree-like dependency structure), etc. Syntax may reveal the speaker's level of culture, intentional persuasive attitudes towards the public, irritation or rude passion during the production of the speech, etc. A combination of syntactic and semantic means of investigation could then bring forward the semantic roles of verbs in sentences (see FrameNet), as well as the balance between given and new information, or between theme and rheme. The final goal of a combined syntactic-semantic analysis is the inference of a logical form of the sentence, which would give a formal expression of its content.

3.3 The discourse-level perspective

Beyond the sentence, at the discourse level, a rhetorical analysis identifies relations or interdependencies holding between adjacent spans of text. The arguments of a relation (discourse units, or spans of text) can then be compared with each other in terms of their importance (nuclearity). The rhetorical relations and their nuclearity are grouped into rhetorical schemes, general patterns in terms of which spans of text can be recursively analysed. Discourse theories are mainly concerned with explaining the structure of a text (how a text is organised in segments, and these in sub-segments, and how this compositional structure influences the comprehension of its meaning), its degree of difficulty (for instance, why certain texts are easier to interpret than others), its cohesion (what makes the different components of a text look as if glued together) and its coherence ("Intuitively, coherence is a semantic property of discourse, based on the interpretation of each individual sentence relative to the interpretation of other sentences"), and, finally, the relationship between coherence, cohesion and discourse structure. Summarisation issues are also rooted in discourse-level analysis.

3.4 The pragmatic perspective

The pragmatic analysis should be based on the
knowledge of the political intentions (of both the speaker and the receiver) in connection with the ideological meanings of a speech. Only with a good knowledge of the political aspirations of the hearers, and knowing that the speaker is himself aware of this spectrum of political aspirations, would a human analyst succeed in interpreting the whole range of subtleties of a political speech. Clearly, pragmatics accounts for a good deal of the process of interpreting political speeches. It is nevertheless true that an experienced human analyst can grasp these facets of the pragmatic context of a political speech even with little direct knowledge of them. It is like an act of reverse engineering, in which the analyst is able to infer the political ideology of the speaker and of the audience from the speech itself.

A closer look at a pragmatic analysis of a political discourse reveals the following aspects: interpretation of the text in terms of the psychological distance between partners, opponents, etc.; defining the transmitter's political attitude before and after the communication; determining the receiver's political attitude (i.e., being pro, against or undecided); pursuing the echoes of the political communication in the audience (immediately) or in society (after a delay), etc.; discovering the political speaker's intentions by evidencing the semantic roles of different sentence constituents (reiterations, expressions, etc.).

4 A platform for multi-dimensional political discourse analysis

In this section we briefly describe the Discourse Analysis Tool (DAT), a platform that integrates a range of language processing tools with the intent to build complex characterisations of public discourse. Of the perspectives of semiotic analysis discussed above, DAT (currently at version 3) implements only the lexical-semantic features.

The concept behind the lexical-semantic analysis in DAT is that the vocabulary used by
a speaker opens a window onto the author's sensibility, his/her level of culture and her/his cognitive world. Some of these means of expression are persuasive, aimed at convincing the public of the speaker's own opinions, while others are manipulative, aimed at inducing a false perspective on an issue. Figure 1 displays a snapshot of the interface showing a semantic analysis during a working session. The platform offers two alternative views for presenting the results of the lexical-semantic analysis: graphical (pie, function, column and area charts) and tabular (Microsoft Excel compatible). The vocabularies of the 33 semantic classes of DAT v3 (detailed in Figure 2) are considered to best fit the needs of interpreting the political discourse of our corpus.

Figure 1. The DAT interface: the left window shows the selected files, the middle window the text of the selected file, and the right window information about the text (language, word count, dominant class, etc.). Below, a plot (chosen from a range of graphical tools) is displayed. By selecting a specific class in the middle window, all words assigned to that class are highlighted in the text.

Figure 2. Semantic classes in DAT v3.

Our interest went mainly to determining those political attitudes able to influence the voting decision of the audience. The system can, however, be parameterised to fit other contexts as well: the user can define her/his own semantic classes at will, which, as can be noticed in Figure 2, are partially organised in a hierarchy. The lexicon associated with these classes was developed in several phases. We started with a small vocabulary (mainly looking for Romanian translation equivalents of a subset of the LIWC-2007 classes). Then, the words of this initial lexicon were used as seeds in an attempt to enrich the lexicon automatically, using the morphological database of
DEX-online, an online electronic dictionary of the Romanian language.

To prepare the integration of syntax into DAT, a dependency parser for Romanian is being trained on a dependency treebank. This corpus of syntactic trees (now incorporating over 4,000 tree structures) was developed partially by hand, using a graphical editing tool (TreeAnnotator), and later with the help of the dependency parser itself. After the corpus reached 100 structures, its development continued in a bootstrapping manner: new sentences belonging to the interim president were first parsed by the parser and then manually corrected by the first author of this paper. This considerably sped up the development of the corpus. The stored trees are in XML format, with the following elements:

• sentence, marking the sentences; its attributes are a unique identifier and the name of the annotator who last worked on the sentence;
• word, marking the individual words of the sentence; its attributes are a unique identifier, the morphological tag, the lemma of the inflected word, the ID of its parent word (the head in the dependency structure) and the name of the relation linking the word to its parent.

The next version of DAT is planned to also integrate a syntactic parser, offering the user the possibility to identify and count relations between different parts of speech, and to put in evidence patterns of use at the semantic and syntactic level, discursive behaviours, etc.

5 A comparative study

5.1 The corpus

The corpus used for our investigation was configured to allow a comparative study of the discursive characteristics of three political leaders: Traian Basescu, Crin Antonescu, and Victor Ponta. Traian Basescu has been the President of Romania since 2004 (with an interruption in the summer of 2012, when he was suspended, the period monitored in this study) and is one of the most complex
personalities of the Romanian political arena of the last decade. The second political actor, Crin Antonescu, is an ex-leader of the Liberal Party, was for a short while President of the Senate and then Romania's interim President (during Basescu's suspension). The third political actor, Victor Ponta, is an ex-leader of the Social Democrat Party and the current Romanian prime minister, and represents the new political generation. His party and Antonescu's party form the USL coalition (the Social-Liberal Union). This coalition, with a social-liberal ideology, is a first in Romania. We are thus weighing three styles of political discourse that, at least in principle, are perceived as different ideologies (democrat-liberal, liberal, and social-democrat). But beyond comparing political discourses belonging to different ideologies, the politically dense year 2012 offers the opportunity to study how the stress of a political battle on the edge of a crisis is reflected in the speeches of these major opponents, as evidenced by a semiotic analysis.

Indeed, 2012 was a year of governmental changes in Romania. After the January street protests and following President Basescu's request, the Boc Government resigns (20 January) and is replaced by the Ungureanu Government (6 February). Constantly contested and censured by public opinion, the Ungureanu Government falls less than 3 months later, following a no-confidence vote in Parliament put forward by the opposition bloc PSD-PNL-PC (27 April). The President then designates a new prime minister, Victor Ponta, the head of the main opposition party, PSD, supported by Crin Antonescu, the head of the liberals. The two politicians lay the foundations of a new coalition, USL, whose principal objective is the removal of the President, thus following one of the demands of the protesters. On 10 June, the local elections completely change the political map of Romania: the
governmental coalition gains legitimacy in the main cities and districts of the country. The next step is the removal of Basescu: on 6 July, the Parliament votes to suspend the President. This triggers the political crisis around which our analysis revolves.

For the elaboration of preliminary conclusions on the crisis process, we collected, stored and processed, partly manually and partly automatically, political texts published by three national online publications with similar profiles. A small part of this corpus, a collection of 100 carefully chosen political sentences, each containing one or more clauses, has been syntactically annotated.

5.2 The lexical-semantic analyses

Apart from simply counting the frequencies of mentions of semantic classes for one author, the system can also perform comparative studies between two or more authors, or for the same author in different periods of time. To exemplify, we present below several charts, each with two streams of data representing political speeches of the three leaders mentioned above in the context of the political crisis. In fact, our analysis makes a pairwise comparison of the three political actors. In each of the diagrams that follow, for each semantic class, the values corresponding to one subject are subtracted from those of the other. Our experience shows that an absolute difference below the threshold of 0.5% should be considered irrelevant and is, therefore, ignored in the interpretation; for this reason, such classes are not represented in the charts.

The graphical representation in Figure 3, in which Traian Basescu, President of Romania before the temporary suspension (shown above the Ox axis), is compared against Crin Antonescu, President of the Senate at that time (shown below the Ox axis), should be interpreted as follows: Traian
Basescu was more interested in the labour market in Romania (the class work) and spoke in a more intuitive tone (the class intuition) than Crin Antonescu, whose discourse had patriotic accents (the class nationalism) and who developed a comparative analysis of the failures (the class failures) and achievements (the class achievements) of Basescu's presidency.

Figure 3. The average differences in the frequencies of all classes (that cumulate more than 0.5% of occurrences) in the political discourses of Traian Basescu and Crin Antonescu, before the initiation of the crisis.

It is interesting to see how quickly the discursive spectrum changes after Basescu's suspension: on the same day, Basescu turns negative, and Antonescu positive. This is, in fact, a normal attitude, as the first subject had been suspended by the vote of the Parliament, while the second was to become interim President by virtue of being President of the Senate. This new situation is depicted in the chart in Figure 4, which again shows two streams of data belonging to the same subjects, but this time after the crisis erupted (after Basescu's suspension). Our reading of the diagram is as follows: Traian Basescu had a negative tone (the class anger), but kept a more rational attitude (the class intuition) than Crin Antonescu, who became full of hope (the class positive) and had a stronger patriotic attitude (the class nationalism).

Figure 4. The average differences in the frequencies of all classes (that cumulate more than 0.5% of occurrences) in the political discourses of Traian Basescu and Crin Antonescu, after the initiation of the crisis.

An unexpected element was the absence of the Romanian Prime Minister, Victor Ponta, from the Parliament session; he preferred to make a short statement after Basescu's suspension. It is also interesting to draw a comparative radiography of the other two political opponents, Traian Basescu and Victor Ponta, at a critical moment, i.e.
immediately after the political crisis broke out. The chart in Figure 5 compares the political texts of Traian Basescu (above the Ox axis) and Victor Ponta (below the Ox axis). Our reading is the following: Traian Basescu had a negative tone (the classes negative and anger), but kept a rational attitude (the classes rational and intuition), while Victor Ponta was satisfied with the results (the class positive).

Figure 5. The average differences in the frequencies of all classes (that cumulate more than 0.5% of occurrences) in the political discourses of Traian Basescu and Victor Ponta, after the initiation of the crisis.

Another study of interest to us is the comparison of the same political actor across different periods of time, in our case before and after the initiation of the crisis that resulted in the Romanian President's suspension. As an example, we have chosen Basescu's speeches. The graphical representation in Figure 6, in which President Traian Basescu's speech (above the Ox axis) is compared against suspended President Traian Basescu's speech (below the Ox axis), should be interpreted as follows: before his suspension, the subject put more emphasis on social aspects, and his discourse was positive and insisted on achievements. By contrast, after being suspended, his discourse became emotional and negative, with eruptions of anger and sadness, while still preserving a rational tone. For instance, before his suspension, Basescu used expressions such as "se pare că eu nu reușesc" (it seems that I don't succeed), "decât atingerea scopurilor politice" (other than attaining political purposes), "Eu cred că este o greșeală" (I believe it is a mistake), etc. After his suspension, Basescu changed his discursive tone, preferring expressions such as "În concluzie, mergem la Referendum" (in conclusion, we're going to the Referendum), "dar, să
vedem" (but let's see), "Dar înainte de a merge la referendum" (but before going to the Referendum), etc.

Figure 6. Basescu versus himself, before and after the suspension.

5.3 The syntactic analyses

In order to proceed with a syntactic-level investigation, the text bodies were automatically pre-processed by an NLP processing flow that included sentence splitting, tokenisation, part-of-speech tagging and lemmatisation. Then, two thirds of the corpus were automatically parsed into FDG structures, and the remaining part was manually annotated using the TreeAnnotator interface. Both resulted in richly annotated XML files. Table 1 shows the size of the corpus used in these syntactic analyses.

Table 1. The corpus of texts annotated for syntax in Crin Antonescu's speeches

Number of sentences: 123
Number of words: 3,960
Number of annotated sentences: 100
Number of words in the annotated sentences: 3,286

We concentrated our analysis on three types of syntactic relations that we believe play a rhetorical role in the crisis context: adjectival, appositional and anacoluthic (Table 2 displays absolute and relative values for all types of relations). Note that none of these relations is compulsory in the syntax of the phrase (the same holds, in Romanian, for overtly expressed pronouns in subject position, for instance). Moreover, anacoluthic constructions are considered errors in cultivated speech, although, properly mastered, they can have rhetorical value. Therefore, the use of all these relations is strictly a matter of personal choice.

The adjectival structure (marked as a.adj, a.subst, a.vb and a.adv in Table 2; 19.5% of all relations in the corpus) covers adjectival, nominal, verbal and adverbial attributes: the adjectives add colour to the discourse. The orator not only brings contextual (possibly new) information, but also enhances the utterance by detailing and developing it. The adjectival group is usually part of
the rheme (the new information), not the theme (the old information), and is usually placed (in Romanian) after the theme element. When it is placed in the thematic position, its role is emphatic, usually associated with a particular tone, but generally it does not change the content of the message. The relation reveals a certain taste for belletristic culture on the part of the author. In Figure 7 the arrows highlight the presence of two adjectival structures: "Românie adevărată" (real Romania) and "Românie normală" (normal Romania).

Figure 7. An adjectival structure on a dependency tree, visualised with TreeAnnotator.

Table 2. Occurrence of dependency relations in Crin Antonescu's political speeches corresponding to the crisis context

Relation    Number   Percentage
coord          286        11.1%
prep           320        12.4%
a.adj          156         6.0%
c.d            198         7.7%
punct          100         3.9%
sbj             96         3.7%
part           120         4.6%
c.i             76         2.9%
a.subst        198         7.7%
a.vb           112         4.3%
det             90         3.5%
c.c.m           98         3.8%
n.pred          60         2.3%
aux             84         3.3%
a.adv           40         1.5%
refl           120         4.6%
anacol          98         3.8%
c.c.t           40         1.5%
neg             80         3.1%
ap             102         3.9%
c.c.l           46         1.8%
comp            40         1.5%
c.c.scop        24         0.9%
Total         2584         100%

The apposition structure (ap in Table 2; 3.9%) is the dependency relation that holds between two lexical sequences, called base and apposition (the apposition being open to an unlimited number of terms), the second giving complementary information on the first. The apposition structure should be distinguished from the syntactic relations of subordination and coordination, because there is no syntactic hierarchy between the base and the apposition. However, by convention, in our dependency structures the appositive term is attached to the base.

Figure 8. An apposition structure visualised with TreeAnnotator.

In Figure 8, the arrow highlights an apposition structure. The sentence "Românii se aruncară cu entuziasm" (The Romanians jumped with enthusiasm) is interrupted by the apposition "setoși de a reînvia la o
viață nouă" (approx. thirsty to be reborn into a new life), which adds contextual information about the main subject "Românii" (the Romanians).

The anacoluthic structure (anacol in Table 2; 3.8%) marks the interruption of a syntactic construction (clause, phrase) and its continuation with another construction. Grammar books generally regard the anacoluthon as an error, so, strictly grammatically, it is forbidden. Detecting it automatically in texts is extremely difficult because it is rare, and a parser needs many occurrences in order to learn to recognise it. In long sentences it is difficult even for an experienced annotator to notice these intentional (or unintentional) errors, because the interspersed components have such diverse structures.

In the example in Figure 9, the main clause "După dânsul, veni mai târziu Regulamentul" (After him, the Regulation came later) is followed by the anacoluthon "căci el" (because it). This anacoluthon represents a suspended nominative (nominativus pendens) relation. The author introduces the nominative "el" (it), which seems to function as the subject of a predicate that is never uttered, and then feels the need to change the discourse theme. Experienced political actors use anacoluthic structures strategically, with the intent to focus the discourse or to highlight a particular element. In this example, the politician focuses on "Regulamentul" (the Regulation) and on the concessive subordinate clause "deși fu impus de străini" (although it had been imposed by foreigners).

Figure 9. An anacoluthic structure, visualised with TreeAnnotator

5.4 The rhetorical and pragmatic analysis

As mentioned, a discourse-level analysis should reveal elements of coherence and cohesion of the text, together with the identification of the rhetorical structure of the discourse. Some of these aspects are technologically feasible with
different degrees of accuracy. Discourse-level techniques are applied at the very end of a long processing chain. This chain should include segmentation into sentences, tokenisation, POS tagging and lemmatisation, and segmentation at the clause level. Optionally, in a more developed type of analysis, it should also include shallow parsing (for the identification of noun phrases), named-entity recognition and anaphora resolution.

Counting the different types of rhetorical relations in a political speech could reveal much about the rhetorical strategy of the author and the dynamics of the discourse. A rhetorical parser is usually trained to recognise complex rhetorical trees on a corpus manually annotated with these structures. The discourse parser developed in the NLP-Group@UAIC-FII builds rhetorical structures based on the identification of cue words and other discursive features. The trees output by the current implementation, however, lack the names of the relations, although they retain the cue words and the nuclearity.

The identification of some cohesion and coherence elements of a political speech, as mentioned in Section 3.3, is also perfectly feasible with present-day technology. Centering parsers (see, for instance, [...]) could measure the coherence of a text on a scale from 0 to 4. Scaling up an exploratory tool for the purpose of our investigation would be an interesting research objective. Such a tool should take into consideration that a high-quality human discourse is not necessarily one that reaches the maximum on the coherence scale, because such a discourse would also be very boring. Equally, it should not resemble a randomly generated text, which would be completely incoherent. Great orators master this balance like a pharmacist measuring ingredients: as the discourse unfolds, they combine, in the right proportions, the fulfilment of expectations with the unexpected and the surprising. Present-day techniques make feasible the development of a number of
automatic techniques in the area of rhetorical and coherence analysis. It will be our further objective to concentrate on this type of investigation.

6 Conclusions

The analysis we proposed in this paper aims at verifying whether a semiotic perspective anchored in natural language processing techniques could be of value in evaluating political speeches. If this proves to be feasible, then semiotics would become a highly applicative science, with interesting virtues in the optimisation of political discourse. The rhetorical weapons of a political actor should be the following:

  - the diversity of the lexicon and a proper mastering of the semantic classes;
  - the syntactic form;
  - the emancipation of the expression;
  - coherence;
  - a proper mastering of comprehensibility.

It is our conviction that present-day linguistic technology can successfully cover many of these facets.

However, we are aware that this study only sketches a way forward. Much more should be studied before a reliable discourse-interpreting technology becomes a tool in our hands. We should also be aware of the dangers of false interpretation. For instance, consider the three orators used in our experiments. The differences at the level of lexicon and syntax that we have shown to distinguish them should be attributed only partially to their idiosyncratic rhetorical styles, because these differences could also have ideological roots. Moreover, the speeches of many public actors, especially today, are the product of teams of communication specialists. Conclusions regarding, for instance, their cultural universe should therefore be drawn with care.

It remains to be determined what impact the use of certain syntactic structures, such as adjectival, appositional and anacoluthic ones, could have on an audience in political discourse. Different politicians could raise the use of these structures to the level of a rhetorical strategy, therefore exploiting them perhaps too
much in favour of their goals. In other words, this study shows that technological instruments are able to detect tendencies to manipulate the receiver. Such tendencies have the evident aim of diverting the audience's attention from the actual communicated content in favour of the orator. The software allows online editing of a yet-to-be-delivered speech, in order to fit it to the audience profile (the public at large, journalists, different levels of culture).

Many interpretation facets depend on the specific context in which a discourse is uttered. For instance, in a crisis context a political discourse should be evaluated as a function of the balance between two agendas: that of an orator who happens to be on the side of political power, and the opposite one. Different intensities of emotion could also be highlighted, and we are preparing a more fine-grained scale of emotional expressions. It is a known fact that political actors can easily manipulate the audience (e.g., through the sadness class) when their themes are treated with excessive emotional tonality.

We are aware that many technological aspects remain to be refined and enhanced. One of the most important is the determination of the senses of words and expressions in context. In the future we intend to include a word sense disambiguation module in order to determine the correct senses, in context, of ambiguous words. These are words that are ambiguous between different semantic classes, or between classes in the lexicon and outside the lexicon (in which case they would not have to be counted). Also, in certain contexts negation can completely reverse the semantic class to which an expression belongs, and it therefore needs special treatment. The collection of manually annotated texts is only at its beginning, a starting point for efficient automatic annotation. In the near future we will manually correct all the automatically annotated texts, thus improving the behaviour of the parser. Another line to be
continued regards the evaluation metrics, which have not received enough attention so far. We are currently studying other statistical metrics able to give a more comprehensive picture of different facets of political discourse.

We believe that the platform has a range of features that make it attractive as a tool to assist any kind of political campaign. It can be rapidly adapted to new domains and to new languages, and its interface is user-friendly and offers a good range of functionalities. It helps to outline distinctive features which bring a new and, sometimes, unexpected view of the discursive characteristics of political authors.

Acknowledgments: In performing this research, the first author was supported by the POSDRU/89/1.5/S/63663 grant, and the second author by the ICT-PSP projects METANET4U #270893 and ATLAS #250467. Alex Moruz helped the first author to clean the DAT Romanian lexicon in an early phase. Afterwards, Radu Simionescu largely extended the lexicon by importing the Romanian morphology from the DEX-online database. We are grateful to Cătălin Frâncu and Radu Borza for offering this database. The DAT platform was developed by Mădălina Spătaru, as a post-master activity in the Faculty of Computer Science of the "Alexandru Ioan Cuza" University of Iași. All the Romanian NLP components mentioned in this paper were developed in the NLP-Group@UAIC-FII.

References

D. Anechitei, D. Cristea, I. Dimosthenis, E. Ignat, D. Karagiozov, S. Koeva, M. Kopec, C. Vertan (2013, to appear). Summarizing Short Texts Through a Discourse-Centered Approach in a Multilingual Context. In A. Neustein, J. A. Markowitz (eds.), Where Humans Meet Machines: Innovative Solutions to Knotty Natural Language Problems. Springer Verlag, Heidelberg/New York.

C. F. Baker, C. J. Fillmore, J. B. Lowe (1998). The Berkeley FrameNet project. In Proceedings of COLING-ACL 1998, Montreal, Canada.

A. Belogay, D. Karagyozov, S. Koeva, C. Vertan, A. Przepiórkowski, D. Cristea, P. Raxis (2012). Harnessing NLP Techniques in the Processes of Multilingual Content Management. In W. Daelemans, M. Lapata, L. Màrquez (eds.), Proceedings of EACL 2012, the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, April 23–27, pp. 6–10, ISBN: 978-1-937284-19-0.

D. Cristea, N. Ide, L. Romary (1998). Veins Theory: An Approach to Global Cohesion and Coherence. In Proceedings of Coling/ACL '98, Montreal.

D. Cristea, A. Iftene (2011). Grounding Coherence Properties of Discourse. In ALEAR Final Report, vol. II: Embodied Cognitive Semantics, Berlin, April.

U. Eco (1996). Limitele interpretării (The Limits of Interpretation). Ed. Pontica, Constanța.

Gramatica limbii române (The Grammar of the Romanian Language) (2005). Vol. II, Enunțul (The Statement). Ed. Academiei Române, București, pp. 105–113, 619–631, 743–747.

D. Gîfu, D. Cristea (2012). Multi-dimensional analysis of political language. In J. J. (Jong Hyuk) Park, V. C. M. Leung, C.-L. Wang, T. Shon (eds.), Future Information Technology, Application, and Service, volume 1/164. Springer Science+Business Media, Dordrecht.

B. J. Grosz, A. K. Joshi, S. Weinstein (1995). Centering: A framework for modeling the local coherence of discourse. Computational Linguistics, 21(2), 203–225.

E. Hajičová, B. H. Partee, P. Sgall (1998). Topic–Focus Articulation, Tripartite Structures, and Semantic Content. Studies in Linguistics and Philosophy, 71. Kluwer, Dordrecht.

M. A. K. Halliday, R. Hasan (1976). Cohesion in English. Longman, London.

E. D. Liddy (2001). Natural Language Processing. In Encyclopedia of Library and Information Science, 2nd Ed. Marcel Dekker, Inc., NY.

W. C. Mann, S. A. Thompson (1988). Rhetorical Structure Theory: Toward a functional theory of text organization. Text, 8(3), 243–281.

D. Marcu (2000). The Theory and Practice of Discourse Parsing and Summarization. The MIT Press, Cambridge, Massachusetts.

Ch. Morris (1938). Foundations of the Theory of Signs. The University of Chicago Press.

J. W. Pennebaker, M. E. Francis, R. J. Booth (2001). Linguistic Inquiry and Word Count: LIWC2001. Erlbaum Publishers, Mahwah, NJ.

H. F. Plett (1983). Știința textului și analiza de text (The Science of Text and Text Analysis). Ed. Univers, Bucharest.

N. Rescher (1973). The Coherence Theory of Truth. Oxford UP, London.

S. Romaine (1994). Language in Society: An Introduction to Sociolinguistics. Oxford University Press Inc., New York.

F. de Saussure (1916). Cours de linguistique générale. Payot, Paris.

L. Tesnière (1959). Elements of Structural Syntax. Éditions Klincksieck.

T. Van Dijk (1977). Text and Context: Explorations in the Semantics and Pragmatics of Discourse. Longman, New York.

Daniela Gîfu, Dan Cristea
Received July 24, 2012

Daniela Gîfu
"Alexandru Ioan Cuza" University of Iași, Faculty of Computer Science
16, Berthelot St., 700483 Iași, Romania
Phone: +40 232 201724
E-mail: daniela.gifu@info.uaic.ro

Dan Cristea
"Alexandru Ioan Cuza" University of Iași, Faculty of Computer Science
16, Berthelot St., 700483 Iași, Romania
Phone: +40 232 201542
E-mail: dcristea@info.uaic.ro

Institute of Computer Science, Romanian Academy, the Iași branch
2, T. Codrescu St., 700481 Iași, Romania